Abstract: Consider constrained-deadline sporadic tasks
scheduled on a multiprocessor where (i) each task is
characterized by its execution requirement, deadline, and
minimum inter-arrival time, (ii) each task generates a sequence
of jobs, (iii) the execution requirement of a job and its potential
for parallel execution is described by one or many stages with a
stage having one or many segments and different segments in the
same stage can execute in parallel and a segment is only allowed
to start execution if all segments of previous stages have finished
execution, and (iv) there is contention for shared resources
in the memory system (cache eviction, reordering in memory
controller, memory bus contention). We present an algorithm
that (i) performs schedulability testing for tasks scheduled
with global-Earliest-Deadline-First (gEDF), (ii) configures the
virtual-to-physical address translation so that a cache block
fetched to the last-level cache by one task cannot be evicted
by another task, (iii) configures the virtual-to-physical address
translation to attempt to eliminate the extra execution time
caused by the reordering effect in the memory controller and
if this is not possible, then the reordering effect is considered
in the schedulability analysis, and (iv) considers the effect
of contention for the memory bus. Our solution is based on
formulating this problem as a Mixed-Integer Linear Program
(MILP). We have implemented a tool based on this theory and
validated its output against measurements on a real computer.

I.  Introduction 

Multicore processors are the norm today. The trend is that
the number of processors on a chip increases exponentially
while the clock frequency stays constant. At the same time,
software practitioners are under pressure to deliver improved
functionality, which has increased execution requirements. This
trend makes it increasingly common in real-time systems that a
job has an execution requirement so large that executing it
sequentially causes a deadline miss; hence, the only way for the
job to meet its deadline is to perform some execution in parallel.
Some software is inherently sequential, however, so a software
system typically consists of parts that can execute in parallel
and parts that cannot. This brings the challenge:

C1. Schedule software where some parts can execute
in parallel so that all deadlines are met and prove
before run-time that their deadlines are met.

Timing  of  software  executing  on  a  COTS  multicore  processor 
depends  not  only  on  the  processor  scheduler  but  also  on 
contention  for  shared  resources  in  the  memory  system.  This 
includes  (i)  the  last-level  cache  shared  between  processors, 
(ii)  the  row  buffer  in  each  memory  bank  storing  the  most 
recently  accessed  row,  and  (iii)  the  memory  bus  (the  bus 
between  the  memory  controller  and  DRAM  memory  modules). 
A  cache  memory  is  typically  organized  as  a  set  of  cache  sets 
where certain bits of the physical address of a memory access
determine which cache set the memory access should use.
Hence,  if  the  virtual-to-physical  address  translation  is  set  up 
so  that  for  the  physical  addresses  generated,  it  holds  that  no 
two  memory  accesses  of  different  tasks  use  the  same  cache 
set,  then  it  is  guaranteed  that  a  cache  block  fetched  to  the 
cache  by  one  task  cannot  be  evicted  by  another  task.  Also, 
DRAM  memories  are  typically  organized  as  a  set  of  banks 
with  each  bank  having  multiple  rows  and  each  bank  having  one 
row  buffer  which  stores  the  data  of  the  most  recently  accessed 
row.  When  a  memory  access  experiences  a  miss  in  the  shared 
cache,  (i)  precharging  is  performed,  that  is,  the  data  in  the 
row  buffer  is  written  back  to  its  row  in  the  memory  bank  and 
then  (ii)  the  memory  access  activates  a  row  in  a  memory  bank 
(the  memory  bank  is  indicated  by  certain  bits  in  the  physical 
address  of  the  memory  access  and  the  row  is  indicated  by  other 
bits)  so  this  row  is  loaded  in  the  row  buffer  of  the  memory  bank 
and  then  (iii)  the  memory  access  reads  data  from  this  buffer 
and  transfers  the  data  to  the  processor  (if  the  memory  access  is 
a  load)  or  writes  data  to  this  row  buffer  (if  the  memory  access 
is  a  store).  If  the  row  needed  for  a  memory  access  is  already 
loaded  in  the  row  buffer,  then  precharge  and  activate  are 
not  performed  and  hence  execution  is  faster.  For  this  reason, 
memory  controllers  reorder  memory  accesses  so  that  memory 
accesses  to  the  row  that  is  in  the  row  buffer  get  ahead  in  certain 
queues  in  the  memory  controller.  Consequently,  a  memory 
access  can  be  delayed  because  other  memory  accesses,  of  other 
tasks,  get  ahead  in  the  queue  (reordering  effect).  Hence,  if  the 
virtual-to-physical  address  translation  is  set  up  so  that  for  the 
physical  addresses  generated  by  tasks,  it  holds  that  no  two  of 
them  access  the  same  bank,  then  it  is  guaranteed  that  no  task 
can  suffer  from  this  reordering  effect.  In  addition,  a  memory 
access  can  also  be  delayed  because  other  accesses  use  the 
memory  bus.  This  brings  the  challenges: 

C2. Configure the virtual-to-physical address translation
so that a cache block fetched to the last-level
cache by one task cannot be evicted by another task.

C3. Configure the virtual-to-physical address translation
so that reordering of memory accesses from
different tasks is avoided; if reordering does occur, then
the schedulability analysis computes an upper bound
on the extra execution time due to reordering.

C4.  Compute  an  upper  bound  on  the  extra  execution 
time  caused  by  processors  sharing  the  memory  bus. 

The research literature offers solutions for each of these
challenges individually (see Table I). Unfortunately, it
offers no single solution that addresses all of them together.

Therefore, in this paper, we present a solution for all
these challenges. We assume global-EDF (gEDF) is used and
consider a previously known [1] schedulability test for it; we
choose gEDF and this schedulability test because among the
schedulers and schedulability tests available for tasks with
potential parallelism, gEDF with our chosen schedulability
test offers the best performance bound [1]. We reformulate
this  schedulability  test  as  a  Mixed-Integer  Linear  Program 
(MILP)  and  extend  this  formulation  so  that  it  (i)  configures 
the  virtual-to-physical  address  translation  so  that  a  cache  block 
fetched  to  the  last-level  cache  by  one  task  cannot  be  evicted 
by  another  task,  (ii)  configures  the  virtual-to-physical  address 
translation  to  attempt  to  eliminate  the  extra  execution  time 
caused  by  the  reordering  effect  in  the  memory  controller,  and 
if  this  is  not  possible,  the  reordering  effect  is  considered  in 
the  schedulability  analysis,  and  (iii)  considers  the  effect  of 
contention  for  the  memory  bus. 

The  remainder  of  this  paper  is  organized  as  follows. 
Section  II  presents  the  system  model  we  use.  Section  III  adapts 
a  previously  known  schedulability  test  for  gEDF  to  a  MILP 
formulation.  Section  IV  presents  additional  constraints  that 
express  an  upper  bound  on  the  execution  time  of  a  segment 
due  to  memory  contention  and  how  it  depends  on  memory 
mapping,  and  also  express  other  constraints.  Section  V  puts 
it  all  together  as  a  solution  for  all  the  four  challenges.  Then 
follow  discussions  and  conclusions. 

II. System Model

Fig. 1 illustrates the model we consider. We consider a
system with (i) a computer with $m$ processors of speed $s$ and
(ii) a software system described as a taskset $\tau$. A task $\tau_i$ in
$\tau$ is characterized by $T_i$, $D_i$, $nstages_i$, and $nseg_{i,j}$, with
the interpretation that $\tau_i$ generates a sequence of jobs where
the arrival times of two consecutive jobs of $\tau_i$ are separated
by at least $T_i$, a job of $\tau_i$ needs to finish execution by the
absolute deadline of the job (the absolute deadline of a job of
$\tau_i$ is $D_i$ time units after its arrival), and the execution
requirement is described with stages, where $nstages_i$ denotes the
number of stages of a job of $\tau_i$ and $nseg_{i,j}$ denotes the number of
segments of the $j$th stage of a job of $\tau_i$. Let $C_{i,j}$ denote
an upper bound on the execution requirement of a segment
of the $j$th stage of $\tau_i$ (explained later in this section). A job
executing contiguously for $\Delta$ time units performs $\Delta \times s$ units
of execution. We assume $\forall \tau_i \in \tau : D_i \le T_i$; such tasksets
are called constrained-deadline sporadic tasksets.

When a job of task $\tau_i$ arrives, all the $nseg_{i,1}$ segments
of the 1st stage of task $\tau_i$ become eligible for execution. For
each $j \ge 2$, at the time when all the $nseg_{i,j-1}$ segments of the
$(j-1)$th stage of task $\tau_i$ have finished, all the $nseg_{i,j}$ segments
of the $j$th stage of task $\tau_i$ become eligible for execution. A
segment becomes non-eligible when it has finished execution.
A job of task $\tau_i$ finishes when all the $nseg_{i,nstages_i}$ segments
of the $nstages_i$th stage of this job have finished.
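The stage precedence rule above can be illustrated with a small sketch. It is only an illustration under our own conventions: the function name `eligible_segments`, the 0-based stage and segment indices, and the representation of finished segments as a set of pairs are assumptions, not part of the model's notation.

```python
def eligible_segments(nseg, finished):
    """Return the (stage, segment) pairs of one job that are eligible to run.

    nseg[j] is the number of segments of stage j (0-based here; the text
    numbers stages from 1). finished is the set of (stage, segment) pairs
    that have already completed.
    """
    for j, count in enumerate(nseg):
        pending = {(j, g) for g in range(count)} - finished
        if pending:
            # Every earlier stage is complete, so the unfinished segments
            # of the first incomplete stage are exactly the eligible ones.
            return pending
    return set()  # all stages finished: the job itself has finished


# A job with a 2-segment stage followed by a 3-segment stage.
print(eligible_segments([2, 3], set()))
print(eligible_segments([2, 3], {(0, 0), (0, 1)}))
```

Segments of a later stage never appear in the result while any segment of an earlier stage is still pending, which is the barrier behavior described above.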

gEDF assigns high priority to jobs with early absolute
deadlines and a segment inherits the priority of the job it
belongs to. At each instant, if at most $m$ segments are eligible
for execution at this instant, then all of them execute at this
instant; if $m + 1$ or more segments are eligible for execution,
then the $m$ highest priority segments at this instant are selected
for execution at this instant. A taskset $\tau$ is gEDF schedulable
on a computer with $m$ processors of speed $s$ if for each jobset
that $\tau$ can generate, for each schedule that gEDF can generate
for this jobset, it holds that all deadlines are met.
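As a minimal sketch of the dispatching rule just described, the function below picks the segments that run at one instant. The name `gedf_select`, the `(absolute_deadline, segment_id)` tuple layout, and the tie-breaking by segment identifier are assumptions made for illustration; the text does not prescribe how ties between equal deadlines are broken.

```python
import heapq


def gedf_select(eligible, m):
    """Pick the segments that execute at one instant under gEDF.

    eligible: list of (absolute_deadline, segment_id) pairs, where each
    segment carries the absolute deadline of the job it belongs to.
    Returns at most m segment ids, earliest deadline first. Ties are broken
    by segment_id, which is an arbitrary choice made for this sketch.
    """
    return [seg for _, seg in heapq.nsmallest(m, eligible)]


ready = [(10, "a1"), (5, "b1"), (10, "a2"), (7, "c1")]
print(gedf_select(ready, 2))  # the two earliest-deadline segments
```

When at most `m` segments are eligible, `heapq.nsmallest` simply returns all of them, which matches the first case of the rule.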

Each segment of a stage of a task has a virtual address
space. The virtual address space is organized into pages of
size PAGESIZE. (For example, for x86, PAGESIZE = 4096
bytes.) The memory footprint of a segment of the $j$th stage
of $\tau_i$ is at most $np_{i,j}$ pages. Each page is associated with
a range of virtual addresses. A virtual address is mapped
to a physical address as follows. The $\log_2$ PAGESIZE least
significant bits of the virtual address are copied to the least
significant bits of the physical address. (These least significant
bits of the physical address are called frame offset.) The other
bits of the virtual address are called page index and these are
translated to a frame index and the bits of the frame index
are copied to the most significant bits of the physical address.
$sharedframes$ denotes a set of 8-tuples so that for each
8-tuple $(i', j', g', p', i'', j'', g'', p'')$ it is required that page $p'$ of
the $g'$th segment of the $j'$th stage of $\tau_{i'}$ is mapped to the same
frame as page $p''$ of the $g''$th segment of the $j''$th stage of $\tau_{i''}$.
We assume that for such 8-tuples $\tau_{i'} = \tau_{i''}$ (because otherwise
cache coloring, as we will see, does not work).
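The address translation described above can be sketched as follows. The page table is modeled as a plain dictionary from page index to frame index, which is an assumption made for illustration; the function name `translate` and the example addresses are hypothetical as well.

```python
PAGESIZE = 4096                           # x86 example value from the text
OFFSET_BITS = PAGESIZE.bit_length() - 1   # log2(PAGESIZE) = 12


def translate(vaddr, page_table):
    """Map a virtual address to a physical address.

    page_table maps a page index to a frame index (a dict in this sketch).
    The low OFFSET_BITS bits are copied unchanged as the frame offset; the
    remaining bits (the page index) are replaced by the frame index.
    """
    page_index = vaddr >> OFFSET_BITS
    frame_offset = vaddr & (PAGESIZE - 1)
    frame_index = page_table[page_index]
    return (frame_index << OFFSET_BITS) | frame_offset


# Page 0x1 is mapped to frame 0x2A; the offset 0xABC is preserved.
print(hex(translate(0x1ABC, {0x1: 0x2A})))
```

Because only the frame index is under the control of the mapping, any cache-set or bank bits that fall inside the frame index can be steered by choosing frames, which is what the coloring discussed later relies on.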

In  our  previous  work  [13],  we  presented  and  validated 
a  model  of  the  memory  system  of  typical  COTS  multicore 
processor  based  systems.  In  this  paper,  we  use  a  model 
that  improves  on  that  model  by  having  a  more  fine-grained 
description  of  memory  accesses.  Our  model  is  as  follows. 
The  last-level  cache  (LLC)  is  shared  between  processors.  This 
cache  is  organized  as  a  set  of  cache  sets  where  certain  bits 
in  the  physical  address  determine  which  cache  set  a  memory 
access  is  associated  with.  Some  of  these  bits  are  part  of  the 
frame index and some are part of the frame offset. When a
memory access experiences a miss in the LLC, the memory access
is passed on to the memory controller, which identifies the
memory bank the memory access is associated with (certain
bits in the frame index determine this) and the row in this
memory bank it is associated with (other bits in the frame
index determine this); the access is then inserted in a queue for
memory accesses to this memory bank. The queuing discipline
First-Ready-First-In-First-Out (FR-FIFO) is used. With this queuing
discipline, FIFO is used but with the following exceptions: (i) a
memory  access  can  be  prevented  from  being  performed  at 
certain  instants  because  there  are  DRAM  timing  parameters 
which  state  that  a  certain  part  of  a  DRAM  access  must  wait 
until  a  certain  timing  requirement  (based  on  previous  memory 
accesses)  is  satisfied  and  (ii)  elements  in  the  queue  of  a 
memory  bank  can  be  reordered  so  that  a  memory  access  gets  to 
the  head  of  the  queue  when  this  memory  access  is  associated 
with  the  row  that  is  currently  loaded  in  the  row  buffer.  When 
a memory access gets to the head of the queue of the memory
bank, it contends for the memory bus with memory accesses
of other memory banks. When a memory access is granted
the memory bus, the memory access precharges its associated
memory bank (that is, the data in the row buffer is written
back to its row in the memory bank) and then the memory
access activates its associated row in its associated memory
bank (that is, the data in this row is loaded to the row buffer)
and finally it transfers data (from the row buffer of the memory
bank to the memory controller if the memory access is a load;
the other direction if it is a store). If the row associated with
the memory access is already in the row buffer then precharge
and activate are not performed.

Fig. 1: The model we consider.
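To make the row-buffer behavior and the reordering exception concrete, here is a deliberately simplified sketch. It ignores the DRAM timing constraints of exception (i), the memory bus, and any hardware limit on reordering; the class `Bank`, the function `pick_next`, and the latency arguments are hypothetical names introduced only for this illustration.

```python
class Bank:
    """One memory bank with a single row buffer (simplified model)."""

    def __init__(self):
        self.open_row = None  # row currently held in the row buffer

    def access(self, row, t_pre, t_act, t_rw):
        """Return the service time of one access and update the row buffer."""
        if self.open_row == row:
            return t_rw                 # row hit: data transfer only
        self.open_row = row
        # Row miss: precharge, activate, then transfer. Charging a precharge
        # for an initially empty buffer is a simplification of this sketch.
        return t_pre + t_act + t_rw


def pick_next(queue, open_row):
    """Index of the request served next: the oldest request that hits the
    open row if there is one, otherwise the head of the FIFO queue."""
    for idx, row in enumerate(queue):
        if row == open_row:
            return idx
    return 0


bank = Bank()
print(bank.access(3, 10, 10, 5), bank.access(3, 10, 10, 5))
print(pick_next([7, 3, 7], open_row=3))
```

In the last call the request to row 3 overtakes the older request to row 7; this overtaking is the reordering effect that can delay accesses of other tasks sharing the bank.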

$C_{i,j}(map)$ denotes an upper bound on the execution
requirement of a segment in the $j$th stage of $\tau_i$ for the case
that this segment does not experience contention for resources
in the memory system from other segments and $map$ is the
memory mapping of all tasks in the system. $MA_{i,j,p}(map)$
denotes an upper bound on the number of memory accesses
reaching the memory controller of page $p$ of a segment in
the $j$th stage of $\tau_i$ for the case that this segment does not
experience contention for resources in the memory system
from other segments and $map$ is the memory mapping of all
tasks in the system.

In today's processors, typically, the bits in the physical
address from which the cache set index of LLC is obtained
overlap with the bits that determine the frame index (see
[28]). Also, in today's processors, typically, the bits in
the physical address from which the bank index is obtained
overlap with the bits that determine the frame index (see
[28]). Therefore, one can partition memory frames of physical
memory into cache colors so if two memory accesses belong to
different frames and these two memory frames belong to two
different cache colors, then it holds that one memory access
cannot evict a cache block fetched to LLC by the other memory
access. One can also partition memory frames of physical
memory into bank colors so if two memory accesses belong to
different frames and these two memory frames belong to two
different bank colors, then it holds that one memory access
cannot evict a row in a memory bank that another memory
access has loaded. $H$ denotes the number of cache colors and
$B$ denotes the number of bank colors. MEMCAP denotes
the amount of physical memory in the computer, measured in
the number of frames. Some recent multicore chips use sliced
LLC; such chips prevent us from using 100% of the main
memory when using cache partitioning [14]. For this reason,
let HWSHARE denote the share of physical memory that we
may use. Let CAP = HWSHARE $\times$ MEMCAP$/(H \times B)$.
(In  a  typical  x86  computer  today,  MEMCAP  =  2^^/4096  = 
2^^,H  =  32,B  =  16,  HWSHARE  =  1/4  and  this  yields 
CAP  =  (1/4)  X  217(32  X  16)  =  28) 
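As a quick numeric illustration of the CAP formula, the sketch below evaluates it with exact rational arithmetic. The values $H = 32$, $B = 16$ and HWSHARE $= 1/4$ are those of the example above; the value of MEMCAP used here ($2^{19}$ frames, i.e. 2 GiB of 4096-byte frames) is an assumption chosen for illustration, and the function name `cap` is hypothetical.

```python
from fractions import Fraction


def cap(hwshare, memcap, H, B):
    """Frames available per (cache color, bank color) combination:
    CAP = HWSHARE x MEMCAP / (H x B)."""
    return hwshare * memcap / (H * B)


# MEMCAP = 2**19 frames is an illustrative assumption, not a measured value.
print(cap(Fraction(1, 4), 2**19, 32, 16))
```

Halving HWSHARE or doubling either number of colors halves the number of frames that each color combination can hold, which is the capacity limit that the mapping has to respect.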


Let $C_{i,j}$ be a value such that $\forall map : C_{i,j}(map) \le C_{i,j}$.
Let $MA_{i,j,p}$ be a value such that $\forall map : MA_{i,j,p}(map) \le
MA_{i,j,p}$. In practice, if the memory mapping $map$ is known,
then it is possible to obtain $C_{i,j}(map)$ and $MA_{i,j,p}(map)$ (e.g.
using a worst-case execution-time analysis tool) but obtaining
$C_{i,j}$ and $MA_{i,j,p}$ is very expensive because they describe
behavior of the software for all possible memory mappings
of the system. Even if $C_{i,j}$ and $MA_{i,j,p}$ are obtained, it can
happen that our algorithm selects an abstract memory mapping
$o$ and we choose a memory mapping $map$ that is compatible
with $o$ and that $C_{i,j}$ is much higher than $C_{i,j}(map)$ (and
analogously for $MA_{i,j,p}(map)$). This would result in large
pessimism. We will discuss (in Section VI) how to deal with
these issues. For now, assume $C_{i,j}$ and $MA_{i,j,p}$ are known.

We assume (as do many previous studies [31, 22, 19]) that
a processor is stalled when it waits for memory. We use some
notation from [13], namely, the following:

$$L^{PRE}_{inter} = t_{CK} \tag{8}$$

$$L^{ACT}_{inter} = \max(t_{RRD},\ t_{FAW} - 3 \times t_{RRD}) \times t_{CK} \tag{9}$$

$$L^{RW}_{inter} = \max(WL + BL/2 + t_{WTR},\ CL + BL/2 + 2 - WL) \times t_{CK} \tag{10}$$

$$L_{inter} = L^{PRE}_{inter} + L^{ACT}_{inter} + L^{RW}_{inter} \tag{11}$$

$$L_{conf} = \big(t_{RP} + t_{RCD} + \max(CL + BL/2 + 2,\ WL + BL/2 + \max(t_{WTR}, t_{WR}))\big) \times t_{CK} \tag{12}$$

$$L_{conhit}(x) = \big(\lceil x/2 \rceil \times (WL + BL/2 + t_{WTR}) + \lfloor x/2 \rfloor \times CL + \max(t_{WR} - t_{WTR},\ 0)\big) \times t_{CK} \tag{13}$$
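$$C_i \stackrel{def}{=} \sum_{j=1}^{nstages_i} (nseg_{i,j} \times C_{i,j}) \qquad \eta_i \stackrel{def}{=} \sum_{j=1}^{nstages_i} \left( \left\lceil \frac{nseg_{i,j}}{m} \right\rceil \times C_{i,j} \right) \tag{1}$$

$$bsp_{i,j} \stackrel{def}{=} \frac{C_{i,j}}{s} \times \left\lfloor \frac{nseg_{i,j}}{m} \right\rfloor \qquad sp_{i,j} \stackrel{def}{=} \frac{C_{i,j}}{s} \times \left\lceil \frac{nseg_{i,j}}{m} \right\rceil \tag{2}$$

$$WJ(\tau_i, t, s) \stackrel{def}{=} \begin{cases} 0 & \text{if } t \le 0 \\ WJS(i, t, 1, s) & \text{if } 0 < t \le \eta_i / s \\ C_i & \text{if } \eta_i / s < t \end{cases}$$

$$WJS(i, t, j, s) \stackrel{def}{=} \begin{cases} t \times m \times s & \text{if } 0 < t \le bsp_{i,j} \\ bsp_{i,j} \times m \times s + (t - bsp_{i,j}) \times (nseg_{i,j} \bmod m) \times s & \text{if } bsp_{i,j} < t \le sp_{i,j} \\ C_{i,j} \times nseg_{i,j} + WJS(i, t - sp_{i,j}, j + 1, s) & \text{if } sp_{i,j} < t \end{cases} \tag{3}$$

$$ffdbf(\tau_i, t, \sigma, s) \stackrel{def}{=} \left\lfloor \frac{t}{T_i} \right\rfloor \times C_i + C_i - WJ\left(\tau_i,\ (D_i - (t \bmod T_i)) \times \frac{\sigma}{s},\ s\right) \tag{4}$$

$$h(\tau, m, s, \sigma, t) \Leftrightarrow \left( \sum_{\tau_i \in \tau} ffdbf(\tau_i, t, \sigma, s) \le \left(m - (m - 1) \times \frac{\sigma}{s}\right) \times t \times s \right) \tag{5}$$

$$f(\tau, m, s) \Leftrightarrow \left( \exists \sigma \text{ such that } \left(\sigma \ge \max_{\tau_i \in \tau} \frac{\eta_i}{D_i}\right) \wedge (\forall t \text{ such that } t \ge 0 : h(\tau, m, s, \sigma, t)) \right) \tag{6}$$

$$f(\tau, m, s) \Rightarrow \tau \text{ is gEDF schedulable on a computer with } m \text{ processors of speed } s \text{ for the case that tasks do not experience memory contention} \tag{7}$$

Fig. 2: Previously known schedulability analysis for gEDF scheduling of parallel tasks.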


Here, $L^{PRE}_{inter}$ is the time required for precharge; $L^{ACT}_{inter}$ is the
time required for activate; $L^{RW}_{inter}$ is the time for data transfer.
$L_{conf}$ denotes row-conflict service time and $L_{conhit}(x)$ is a
function which describes the time it takes to serve $x$ consecutive
memory accesses to the same row in the same memory
bank if the row was already activated. These parameters can
be obtained from DRAM datasheets (see [13]). $P$ denotes the
least common multiple of the $T_i$ values. $DMAX = \max_{\tau_i \in \tau} D_i$.
$N_{re}$ is a parameter which describes the limit that the hardware
imposes on reordering (to be discussed later). We define
$UBNOMR = \sum_{\tau_i \in \tau} (\max_{j \in [1..nstages_i]} nseg_{i,j})$, meaning upper
bound on the number of outstanding memory requests. We also
define: $LIM1 = \min(m - 1, UBNOMR - 1)$ and $LIM2 =
\min(m - 1, UBNOMR - 1) + N_{re}$.
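The derived parameters defined above follow mechanically from the taskset, as the sketch below shows. It assumes integer minimum inter-arrival times (so that a least common multiple exists) and Python 3.9 or later for `math.lcm`; the function name `derived_params`, the argument `n_re` standing for the hardware reordering limit, and the example values are all hypothetical.

```python
import math


def derived_params(T, D, nseg, m, n_re):
    """Compute P, DMAX, UBNOMR, LIM1 and LIM2 for a taskset.

    T[i], D[i]: minimum inter-arrival time and relative deadline of task i
    (integers in this sketch). nseg[i][j]: number of segments of stage j of
    task i. m: number of processors. n_re: hardware limit on reordering.
    """
    P = math.lcm(*T)
    DMAX = max(D)
    # Each task contributes at most its widest stage of concurrent segments.
    UBNOMR = sum(max(stages) for stages in nseg)
    LIM1 = min(m - 1, UBNOMR - 1)
    LIM2 = LIM1 + n_re
    return P, DMAX, UBNOMR, LIM1, LIM2


print(derived_params(T=[10, 15], D=[8, 15], nseg=[[1, 4, 1], [2]], m=4, n_re=12))
```

In this example the number of processors, not the number of outstanding requests, is what limits LIM1, since `m - 1 = 3` is smaller than `UBNOMR - 1 = 5`.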

Tasks typically perform execution and access memory in
an initialization phase which does not have real-time
requirements. This execution and these memory accesses are not
considered as a job but the pages accessed need to be mapped to
memory frames. Therefore, INO indicates the number of pages
accessed during initialization.


III. Schedulability Analysis for gEDF of Parallel Tasks

Previous work [1] provided a schedulability test for this
problem for the special case that contention for resources
in the memory system does not occur. Fig. 2 shows this
schedulability test. We will now discuss how to modify this
schedulability test slightly and then rewrite it as a MILP.
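For readers who prefer code to case distinctions, the following sketch evaluates the workload functions and the demand bound of Fig. 2 for a single task. It reflects our reading of those definitions, in particular that the second case of WJ applies up to $\eta_i/s$ and that the argument of WJ in (4) is scaled by $\sigma/s$; the function names, the 0-based stage index and the example numbers are assumptions made for illustration.

```python
import math


def wjs(nseg, C, j, t, m, s):
    """Work completed by stages j, j+1, ... of one job within t time units."""
    if t <= 0 or j >= len(nseg):
        return 0.0
    bsp = (C[j] / s) * (nseg[j] // m)          # time with all m processors busy
    sp = (C[j] / s) * math.ceil(nseg[j] / m)   # time to finish the whole stage
    if t <= bsp:
        return t * m * s
    if t <= sp:
        return bsp * m * s + (t - bsp) * (nseg[j] % m) * s
    return C[j] * nseg[j] + wjs(nseg, C, j + 1, t - sp, m, s)


def wj(nseg, C, t, m, s):
    """Work completed by one job within t time units."""
    total = sum(n * c for n, c in zip(nseg, C))
    eta = sum(math.ceil(n / m) * c for n, c in zip(nseg, C))
    if t <= 0:
        return 0.0
    if t <= eta / s:
        return wjs(nseg, C, 0, t, m, s)
    return total


def ffdbf(nseg, C, T, D, t, sigma, m, s):
    """Forced-forward demand bound of one task over an interval of length t."""
    total = sum(n * c for n, c in zip(nseg, C))
    return math.floor(t / T) * total + total - wj(nseg, C, (D - t % T) * sigma / s, m, s)


# One stage with 3 segments of requirement 4 each, on m = 2 unit-speed processors.
print(wj([3], [4.0], 6, 2, 1.0))
print(ffdbf([3], [4.0], T=20, D=10, t=4, sigma=1.0, m=2, s=1.0))
```

With three segments on two processors, the first four time units run two segments in parallel and the next four run the remaining segment alone, so six time units complete 10 of the 12 units of work.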

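Let us choose a value of $K$ that is a positive integer (e.g.
$K = 20$). In the schedulability test expressed by Fig. 2, check
only those $\sigma$ such that there is a $k \in \{1, 2, \ldots, K\}$ such that
$\sigma = (k/K) \times s$. This yields the following schedulability test
with a slight increase in pessimism:

$$h^*(\tau, m, s, k, K, t) \Leftrightarrow \left( \sum_{\tau_i \in \tau} ffdbf\left(\tau_i, t, \frac{k \times s}{K}, s\right) \le \left(m - (m - 1) \times \frac{k}{K}\right) \times t \times s \right)$$

$$f^*(\tau, m, s, K) \Leftrightarrow \left( \exists k \in \{1, 2, \ldots, K\} \text{ such that } \left(\frac{k \times s}{K} \ge \max_{\tau_i \in \tau} \frac{\eta_i}{D_i}\right) \wedge (\forall t \text{ such that } t \ge 0 : h^*(\tau, m, s, k, K, t)) \right)$$

$$f^*(\tau, m, s, K) \Rightarrow \tau \text{ is gEDF schedulable on a computer with } m \text{ processors of speed } s \text{ for the case that tasks do not experience memory contention}$$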
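Clearly, $t = \lfloor t/P \rfloor \times P + t \bmod P$. Thus

$$\sum_{\tau_i \in \tau} ffdbf\left(\tau_i, t, \frac{k \times s}{K}, s\right) = \lfloor t/P \rfloor \times P \times \left(\sum_{\tau_i \in \tau} \frac{C_i}{T_i}\right) + \sum_{\tau_i \in \tau} ffdbf\left(\tau_i, t \bmod P, \frac{k \times s}{K}, s\right).$$

Consequently, when
evaluating $(\forall t \text{ such that } t \ge 0 : h^*(\tau, m, s, k, K, t))$, it
is only necessary to consider values of $t$ that are at most
$P$. Hence, $f^*(\tau, m, s, K)$ is true if and only if there is an
assignment of values satisfying $\sum_{k=1}^{K} w_k \ge 1$ and

$$\forall k \in [1, K] : w_k \in \{0, 1\}$$

$$\forall (i, k) \text{ such that } (\tau_i \in \tau) \wedge (k \in [1, K]) : (w_k = 1) \Rightarrow \left(\eta_i \le \frac{s \times k \times D_i}{K}\right)$$

$$\forall k \text{ such that } k \in [1, K] : (w_k = 1) \Rightarrow (\forall t \text{ such that } t \in [0, P] : h^*(\tau, m, s, k, K, t))$$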

Observe that the left-hand side of the inequality defining
$h^*(\tau, m, s, k, K, t)$ is a piecewise linear function of
$t$ and the right-hand side of the inequality defining
$h^*(\tau, m, s, k, K, t)$ is a linear function of $t$. Hence, when
evaluating $(\forall t \text{ such that } t \in [0, P] : h^*(\tau, m, s, k, K, t))$
it is only necessary to evaluate $h^*(\tau, m, s, k, K, t)$ for the
following values of $t$: (i) values of $t$ such that the derivative
of the piecewise linear function changes, (ii) $t = P$, and
(iii) $t = 0$. With respect to (iii), note that $h^*(\tau, m, s, k, K, 0)$
is true and hence it does not need to be checked. With
respect to (ii), note that $h^*(\tau, m, s, k, K, P)$ can be rewritten as
$((\sum_{\tau_i \in \tau} C_i/T_i) \le (m - (m - 1) \times (k/K)) \times s)$. With respect to
(i), note that for $t$ such that there is a positive integer $q'$ and a
task $\tau_{i'} \in \tau$ such that $t = (q' - 1) \times T_{i'} + D_{i'} - (\eta_{i'}/s) \times (K/k)$,
the above mentioned derivative changes but this $t$ is dominated
by other values of $t$ in the condition and hence, this $t$ does not need to
be checked. Hence, $f^*(\tau, m, s, K)$ is true if and only if there
is an assignment of values satisfying $\sum_{k=1}^{K} w_k \ge 1$ and

$$\forall k \in [1, K] : w_k \in \{0, 1\}$$

$$\forall (i, k) \text{ such that } (\tau_i \in \tau) \wedge (k \in [1, K]) : (w_k = 1) \Rightarrow \left(\eta_i \le \frac{s \times k \times D_i}{K}\right)$$

$$\forall (i', q', j', f', k) \text{ such that } (\tau_{i'} \in \tau) \wedge (q' \in [1, P/T_{i'}]) \wedge (j' \in [0, nstages_{i'} - 1]) \wedge (f' \in \{0, 1\}) \wedge (k \in [1, K]) :$$
$$t_{i',q',j',f',k} = (q' - 1) \times T_{i'} + D_{i'} - \left(\left(\sum_{j''=1}^{j'} sp_{i',j''}\right) + f' \times bsp_{i',j'+1}\right) \times \frac{K}{k}$$

$$\forall (i', q', j', f', k) \text{ such that } (\tau_{i'} \in \tau) \wedge (q' \in [1, P/T_{i'}]) \wedge (j' \in [0, nstages_{i'} - 1]) \wedge (f' \in \{0, 1\}) \wedge (k \in [1, K]) :$$
$$(w_k = 1) \Rightarrow \left( \sum_{\tau_i \in \tau} ffdbf\left(\tau_i, t_{i',q',j',f',k}, \frac{k \times s}{K}, s\right) \le \left(m - (m - 1) \times \frac{k}{K}\right) \times t_{i',q',j',f',k} \times s \right)$$

$$\forall k \text{ such that } k \in [1, K] : (w_k = 1) \Rightarrow \left( \left(\sum_{\tau_i \in \tau} \frac{C_i}{T_i}\right) \le (m - (m - 1) \times (k/K)) \times s \right)$$
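Two of the conditions in this formulation do not involve $t$ and can therefore be checked cheaply for each candidate $k$: the bound on $\eta_i$ and the condition obtained for $t = P$. The sketch below filters the values of $k$ that pass both; it is not a complete schedulability test, because the conditions at the time points $t_{i',q',j',f',k}$ still have to hold. The function names, the dictionary layout of a task, and the use of integer parameters with exact fractions are assumptions made for illustration.

```python
import math
from fractions import Fraction


def total_work(task):
    """C_i: sum over stages of nseg_{i,j} x C_{i,j}."""
    return sum(n * c for n, c in zip(task["nseg"], task["C"]))


def eta(task, m):
    """eta_i: sum over stages of ceil(nseg_{i,j} / m) x C_{i,j}."""
    return sum(math.ceil(n / m) * c for n, c in zip(task["nseg"], task["C"]))


def candidate_ks(tasks, m, s, K):
    """Values k in 1..K passing the two t-independent conditions.

    Each task is a dict with integer fields T, D and lists nseg, C.
    """
    load = sum(Fraction(total_work(t), t["T"]) for t in tasks)
    result = []
    for k in range(1, K + 1):
        ratio = Fraction(k, K)
        speed_ok = all(eta(t, m) <= s * ratio * t["D"] for t in tasks)
        load_ok = load <= (m - (m - 1) * ratio) * s
        if speed_ok and load_ok:
            result.append(k)
    return result


tasks = [{"T": 20, "D": 10, "nseg": [3], "C": [4]}]
print(candidate_ks(tasks, m=2, s=1, K=20))
```

The two conditions pull in opposite directions: a larger $k$ makes the bound on $\eta_i$ easier to meet but shrinks the capacity term $m - (m - 1) \times k/K$, so for heavily loaded tasksets the list can be empty.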

We will now rewrite $ffdbf(\tau_i, t_{i',q',j',f',k}, \frac{k \times s}{K}, s)$ to a form
closer to MILP. Define $h_{i,q,i',q',j',f',k}$ so that $h_{i,q,i',q',j',f',k} =
1$ if $\lfloor t_{i',q',j',f',k}/T_i \rfloor = q$; otherwise $h_{i,q,i',q',j',f',k} = 0$.
Define $r_{i,i',q',j',f',k} = t_{i',q',j',f',k} \bmod T_i$. Define $a_{i,i',q',j',f',k}$
so that $(h_{i,q,i',q',j',f',k} = 1) \Rightarrow (a_{i,i',q',j',f',k} = (q + 1) \times C_i)$.
Then, using (4), rewrite $ffdbf(\tau_i, t_{i',q',j',f',k}, \frac{k \times s}{K}, s)$ as:

$$a_{i,i',q',j',f',k} - WJ\left(\tau_i,\ (D_i - r_{i,i',q',j',f',k}) \times \frac{k}{K},\ s\right)$$

We introduce $wjf$, $wjsf$, $wjss$, and $wjt$ as:

1) $wjf_{i,i',q',j',f',k} = 1$ means that when $WJ(\tau_i, (D_i -
r_{i,i',q',j',f',k}) \times \frac{k}{K}, s)$ is called, the first case in the
definition of WJ is taken; otherwise $wjf_{i,i',q',j',f',k} = 0$.

2) $wjsf_{i,j,i',q',j',f',k} = 1$ means that when $WJ(\tau_i, (D_i -
r_{i,i',q',j',f',k}) \times \frac{k}{K}, s)$ is called, the second case in the
definition of WJ is taken and recursion is performed in WJS
in which stage $j$ is the last entire stage covered and the
first case in (3) is taken; otherwise $wjsf_{i,j,i',q',j',f',k} = 0$.

3) $wjss_{i,j,i',q',j',f',k} = 1$ means that when $WJ(\tau_i, (D_i -
r_{i,i',q',j',f',k}) \times \frac{k}{K}, s)$ is called, the second case in the
definition of WJ is taken and recursion is performed
in WJS in which stage $j$ is the last entire stage covered
and the second case in (3) is taken; otherwise
$wjss_{i,j,i',q',j',f',k} = 0$.

4) $wjt_{i,i',q',j',f',k} = 1$ means that when $WJ(\tau_i, (D_i -
r_{i,i',q',j',f',k}) \times \frac{k}{K}, s)$ is called, the third case in the
definition of WJ is taken; otherwise $wjt_{i,i',q',j',f',k} = 0$.

Elaborating on this yields that $f^*(\tau, m, s, K)$ is true if and
only if there is an assignment of values to variables such that
the constraints in Fig. 3 are satisfied. In Fig. 2, $C_{i,j}$ denotes
the upper bound on the execution requirement of a segment in
the $j$th stage of task $\tau_i$ but in Fig. 3 $cu_{i,j}$ denotes this. ($cu_{i,j}$
means execution requirement that we will use.)

IV.  Memory  Contention 

Previous work [13] provided a method for computing
an upper bound on the response time of a task considering
contention for resources in the memory system. That method
assumes fixed-priority preemptive non-migrative scheduling
and integrates the memory contention analysis in the
schedulability analysis. In this section, we will adapt this memory
contention analysis (i) to compute an upper bound on the extra
execution time of a segment of a single job of a task and do it
without assuming any specific processor scheduling algorithm
and (ii) to express it in a form easily translatable to MILP.
Let $cu_{i,j,g}$ denote an upper bound on the execution
requirement of the $g$th segment of the $j$th stage of task $\tau_i$
considering contention for resources in the memory system
(the extra execution of this contention is considered to be
part of the execution requirement). Also, recall that $cu_{i,j}$ was
defined in the previous section. We will now redefine it. Let
$cu_{i,j}$ denote an upper bound on the execution requirement of
a segment of the $j$th stage of task $\tau_i$ considering contention
for resources in the memory system (the extra execution of
this contention is considered to be part of the execution
requirement). Hence:

$$cu_{i,j} = \max_{g \in [1, nseg_{i,j}]} cu_{i,j,g} \tag{34}$$

(47), (48), (49) express (34) as MILP.

Let $o_{i,j,g,p,h,b} = 1$ indicate that the page with page index
$p$ of the $g$th segment of the $j$th stage of task $\tau_i$ is mapped
to a memory frame with cache color $h$ and bank color $b$;
otherwise $o_{i,j,g,p,h,b} = 0$. Clearly, a page can only be mapped
to one frame and one frame belongs to exactly one cache color
and one bank color. Hence, each page belongs to exactly one
combination of cache and bank color. This yields (41). Also,
if a cache color and bank color is given then the number of
pages that can be mapped to this combination of cache color
and bank color cannot exceed its amount of physical memory.
In order to express this, let $GS_{i,j,g,p}$ be a constant that indicates
how many pages map to the same frame as page $p$ of the
$g$th segment of the $j$th stage of $\tau_i$ maps to. For normal pages,
it holds that $GS_{i,j,g,p} = 1$ but if a page maps to a shared
frame then $GS_{i,j,g,p}$ is larger. $GS_{i,j,g,p}$ can be computed as
follows. Form a graph with one vertex for each $(i, j, g, p)$
and an edge between two vertices $(i', j', g', p')$ and
$(i'', j'', g'', p'')$ if $(i', j', g', p', i'', j'', g'', p'') \in sharedframes$.
Compute the connected components of the graph. Then, for
$(i, j, g, p)$, let $GSVS_{i,j,g,p}$ denote the set of vertices in the
connected component to which the vertex corresponding to
$(i, j, g, p)$ belongs, and let $GSTS_{i,j,g,p}$ denote the set of tuples that
correspond to $GSVS_{i,j,g,p}$. Let $GS_{i,j,g,p}$ denote the cardinality
of $GSTS_{i,j,g,p}$.
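The connected-component computation described above can be sketched with a union-find structure. The function name `shared_group_sizes`, the representation of a page as a 4-tuple `(i, j, g, p)`, and the example data are assumptions made for illustration.

```python
def shared_group_sizes(pages, sharedframes):
    """Compute GS for every page.

    pages: iterable of (i, j, g, p) tuples, one per page.
    sharedframes: iterable of 8-tuples (i', j', g', p', i'', j'', g'', p'')
    requiring that the two pages are mapped to the same frame.
    Returns a dict mapping each page to the size of its connected component.
    """
    parent = {page: page for page in pages}

    def find(x):
        # Follow parent links to the representative, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for tup in sharedframes:
        a, b = tuple(tup[:4]), tuple(tup[4:])
        parent[find(a)] = find(b)   # merge the two components

    size = {}
    for page in parent:
        root = find(page)
        size[root] = size.get(root, 0) + 1
    return {page: size[find(page)] for page in parent}


pages = [(1, 1, 1, 0), (1, 1, 2, 0), (1, 2, 1, 0), (2, 1, 1, 0)]
shared = [(1, 1, 1, 0, 1, 1, 2, 0), (1, 1, 2, 0, 1, 2, 1, 0)]
print(shared_group_sizes(pages, shared))
```

The first three pages are linked transitively and therefore share one frame, giving each of them a group size of 3, while the unrelated page keeps the default size of 1.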

With $GS_{i,j,g,p}$, the limited memory capacity can be
expressed by (42). In addition, the requirement on shared frames,
expressed by the set $sharedframes$, yields (50). $ino_{h,b}$
indicates the number of pages accessed during initialization
that map to frames which belong to cache color $h$ and
bank color $b$.

For  a  pair  of  segments  that  could  possibly  execute  in 
parallel,  we  require  that  the  memory  mapping  is  set  up  so  that 
one  segment  cannot  evict  a  cache  block  that  another  segment 
has  fetched  to  the  cache.  (44), (45),  and  (46)  express  that. 

Fet  mbi  j  g  f,  denote  an  upper  bound  on  the  number  of 
memory  accesses  from  the  segment  of  the  stage  of  task 
Ti  to  memory  bank  b.  This  gives  us  (38).  Fet  mmbiy  ^g/ y 
be  an  upper  bound  on  the  number  of  memory  accesses  on 
memory  bank  b  from  multiple  jobs  of  the  g'*^  segment  of  the 
j'th  Qf  these  memory  access  can  impact 

a  job  of  Ti.  (39)  expresses  it.  In  the  proof  of  Theorem  1,  we 
will  show  that  it  is  an  upper  bound.  Fet  mmhOiyji^g>y  be  an 

Fig. 3: Schedulability analysis for gEDF scheduling of parallel tasks formulated as a MILP (constraints (14)-(33)).



Fig. 4: Contention on bank queue. The vertical axis shows an
upper bound on the delay that the $\mathrm{mb}_{i,j,g,b}$ memory accesses
from the g-th segment of the j-th stage of task $\tau_i$ can suffer
because the other $\mathrm{mmbo}_{i,j,g,b}$ memory accesses from
other segments access memory of bank b and hence contend
on the queue for memory bank b.


upper bound on the number of memory accesses on memory
bank b from multiple jobs of all other segments than the
g-th segment of the j-th stage of task $\tau_i$ such that these memory
accesses can impact a job of the g-th segment of the j-th stage
of task $\tau_i$. (40) expresses it.
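
The three access counts defined by (38), (39) and (40) can be evaluated directly once a mapping is fixed. The following is a minimal Python sketch under stated assumptions: the dictionary layouts (MA, o, tasks, D, T, mb_table) and the function names are hypothetical, integer deadlines and periods are assumed so that the ceiling can use integer division, and the selection rule in mmbo follows the condition of (40) that another segment counts only if it belongs to another task or to the same stage of the same task.

```python
def mb(MA, o, i, j, g, b, H):
    """(38): accesses of segment g of stage j of task i that go to bank b.

    MA[(i, j, p)] is the access count of page p of stage j of task i;
    o[(i, j, g, p, h, b)] is 1 if that page is mapped to cache color h
    and bank color b, and 0 (or absent) otherwise.
    """
    pages = [p for (ii, jj, p) in MA if (ii, jj) == (i, j)]
    return sum(MA[(i, j, p)] * o.get((i, j, g, p, h, b), 0)
               for p in pages for h in range(H))

def mmb(D_i, T_ip, mb_ip):
    """(39): at most ceil(D_i / T_i') + 1 jobs of task i' overlap a window of length D_i."""
    return (-(-D_i // T_ip) + 1) * mb_ip

def mmbo(i, j, g, b, tasks, D, T, mb_table):
    """(40): sum (39) over the segments that can run in parallel with (i, j, g).

    tasks[i'] lists the number of segments of each stage (stages start at 1);
    mb_table[(i', j', g', b)] holds the value of (38) for that segment.
    """
    total = 0
    for ip, nseg in tasks.items():
        for jp, count in enumerate(nseg, start=1):
            for gp in range(1, count + 1):
                # Other tasks always count; the own task counts only for
                # other segments of the same stage.
                if ip != i or (jp == j and gp != g):
                    total += mmb(D[i], T[ip], mb_table.get((ip, jp, gp, b), 0))
    return total
```

Keeping (38) separate from (39) and (40) mirrors the formulation: only (38) depends on the mapping variables, while the other two are fixed multiples and sums of it.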

Now consider memory contention. Look at the queues
inside the memory controller in Fig. 1. Consider the g-th
segment of the j-th stage of task $\tau_i$ and its (at most $\mathrm{mb}_{i,j,g,b}$)
memory accesses that it performs on memory locations of
memory bank b. Let $\mathrm{coat}_{i,j,g,b}$ denote an upper bound on the
extra execution time of this segment because other memory
accesses (from other segments) access this bank (bank b).
Let $\mathrm{oao}_{i,j,g,b}$ denote an upper bound on the number of other
memory accesses that cause extra execution time of this
segment because other memory accesses (from other segments)
access other banks (than bank b). Then, we express $\mathrm{cm}_{i,j,g}$ as

$$\mathrm{cm}_{i,j,g} = C_{i,j} + s \times \Big( \sum_{b=0}^{B-1} \big( \mathrm{coat}_{i,j,g,b} + L_{inter} \times \mathrm{oao}_{i,j,g,b} \big) \Big) \qquad (35)$$

In the above equation, we multiply by s because $\mathrm{coat}_{i,j,g,b}$
and $L_{inter} \times \mathrm{oao}_{i,j,g,b}$ measure execution time whereas $\mathrm{cm}_{i,j,g}$
measures execution requirement.

We will now find expressions for $\mathrm{coat}_{i,j,g,b}$ and $\mathrm{oao}_{i,j,g,b}$.
For this purpose, look again at the queues inside the memory
controller in Fig. 1. A single memory access accessing bank
b can be delayed by the following:

1)  There  are  already  memory  accesses  in  the  queue  of  bank  b 
when  this  single  memory  access  is  inserted  in  the  queue  of 
bank  b  and  because  of  FIFO  queuing,  these  other  memory 
accesses  are  served  first. 

2) After this single memory access is enqueued in the queue
for bank b, there are other memory accesses enqueued
in this bank and these other memory accesses' row is
currently loaded in the row buffer and hence they get
ahead in the queue for this memory bank (reordering).

3)  When  one  of  the  other  memory  accesses  mentioned  in 
1)  or  2)  reaches  the  head  of  the  queue  of  bank  b,  it  is 
not  served  immediately;  instead  it  has  to  wait  for  the 
memory  bus  being  granted  and  this  takes  time  because 
other  memory  accesses  in  the  queues  to  other  memory 
banks  than  bank  b  use  the  memory  bus. 

About 1) and 2): Since we assume a processor stalls
until its memory access has been completed, it follows that
from each processor, there can be at most one outstanding
memory access and hence there are at most LIM1 memory
accesses of 1) above. The hardware places a limit on the
number of reorderings that can happen. In previous work
[13], we introduced a parameter to indicate an upper bound
on the number of those reorderings that a single memory
access can experience. In this paper, we let $N_{re}$ denote this
parameter; a typical value [13] is $N_{re} = 12$. Consequently, the
$\mathrm{mb}_{i,j,g,b}$ memory accesses from the g-th segment of the j-th
stage of task $\tau_i$ performing on bank b have to wait for at most
$\mathrm{mb}_{i,j,g,b} \times (\mathrm{LIM1} + N_{re}) = \mathrm{mb}_{i,j,g,b} \times \mathrm{LIM2}$ other memory
accesses performing on bank b (because of 1) and 2) above).
Because $\mathrm{mmb}_{i,i',j',g',b}$ is an upper bound on the number of
memory accesses from the g'-th segment of the j'-th stage of
task $\tau_{i'}$ that can be performed in parallel with a segment of
a job of task $\tau_i$, assuming that the job of task $\tau_i$ meets its
deadline, it follows, using (40), that the $\mathrm{mb}_{i,j,g,b}$ memory
accesses from the g-th segment of the j-th stage of task $\tau_i$
performing on bank b have to wait for at most

$$\min(\mathrm{mmbo}_{i,j,g,b},\; \mathrm{mb}_{i,j,g,b} \times \mathrm{LIM2}) \qquad (36)$$

other memory accesses performing on bank b (because of
1) and 2) above). Let $\mathrm{oat}_{i,j,g,b}$ be the expression in (36). (It
means other accesses to this bank.) By inspecting $L_{conhit}(x)$
and the parameters in Section II, one can see that these memory
accesses have different effects: the memory accesses that are
in the queue before a memory access has arrived at the queue
cause more interference than the ones that arrive later and
cause reordering. Fig. 4 shows an upper bound.

About 3): A memory access related to memory bank b
is inserted in the queue for the memory bus only if (i) this
memory access is at the head of the queue of the memory
bank b and (ii) there is no memory access related to memory
bank b already in the queue of the memory bus. Hence, a
memory access that has reached the head of the queue of its
memory bank needs to wait for at most $B-1$ other memory
accesses until it is granted the memory bus. Consequently,
the $\mathrm{mb}_{i,j,g,b}$ memory accesses from the g-th segment of the
j-th stage of task $\tau_i$ performing on bank b have to wait for at
most $(\mathrm{mb}_{i,j,g,b} + \mathrm{oat}_{i,j,g,b}) \times (B-1)$ other memory accesses
performing on other banks than bank b (because of 3)). Because $\mathrm{mmb}_{i,i',j',g',b}$
is an upper bound on the number of memory accesses from the
g'-th segment of the j'-th stage of task $\tau_{i'}$ that can be performed
in parallel with a segment of a job of task $\tau_i$, assuming that
the job of task $\tau_i$ meets its deadline, it follows, using (40),
that the $\mathrm{mb}_{i,j,g,b}$ memory accesses from the g-th segment of
the j-th stage of task $\tau_i$ performing on bank b have to wait for
at most

$$\min\Big( (\mathrm{mb}_{i,j,g,b} + \mathrm{oat}_{i,j,g,b}) \times (B-1),\; \sum_{b' \in [0,B-1] \wedge (b' \neq b)} \mathrm{mmbo}_{i,j,g,b'} \Big) \qquad (37)$$

other memory accesses performing on other banks than bank
b (because of 3) above).


This reasoning yields an upper bound on the execution
requirement of a segment in a form close to MILP; see
Fig. 5.
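
For fixed access counts, the bound of (35) together with (36), (37) and the four-case delay of Fig. 4 can be evaluated numerically. The following Python sketch is illustrative only: the class MemParams, the function names and all numeric values in the usage are assumptions, the piecewise delay follows the case split of constraints (53)-(60) in Fig. 5 as read from the extracted text, and where two cases meet at a boundary the sketch simply takes the lower case.

```python
from dataclasses import dataclass

@dataclass
class MemParams:
    B: int               # number of memory banks
    LIM1: int            # bound on accesses already queued (case 1)
    LIM2: int            # LIM1 + N_re
    L_conf: float
    L_inter: float
    L_conhit_Nre: float  # value of L_conhit(N_re)
    WL: float
    BL: float
    t_WR: float
    CL: float
    t_CK: float

def coat(mb, mmbo, p):
    """Extra delay on the own bank, by the four cases of (53)-(60)."""
    base = mb * p.LIM1 * (p.L_conf + p.L_inter)
    step2 = (p.WL + p.BL / 2 + p.t_WR) * p.t_CK
    step3 = (p.WL + p.BL / 2 + p.t_WR + p.CL) * 0.5 * p.t_CK
    if mmbo <= mb * p.LIM1:
        return mmbo * (p.L_conf + p.L_inter)
    if mmbo <= mb * (p.LIM1 + 1):
        return base + (mmbo - mb * p.LIM1) * step2
    if mmbo <= mb * p.LIM2:
        return base + mb * step2 + (mmbo - mb * (p.LIM1 + 1)) * step3
    return mb * (p.LIM1 * (p.L_conf + p.L_inter) + p.L_conhit_Nre)

def cm_bound(C, s, mb, mmbo, p):
    """Evaluate (35); mb and mmbo are lists indexed by bank."""
    total = 0.0
    for b in range(p.B):
        oat = min(mmbo[b], mb[b] * p.LIM2)                     # (36)
        others = sum(mmbo[bb] for bb in range(p.B) if bb != b)
        oao = min((mb[b] + oat) * (p.B - 1), others)           # (37)
        total += coat(mb[b], mmbo[b], p) + p.L_inter * oao
    return C + s * total
```

In the MILP these quantities are decision variables tied together by indicator constraints; the sketch only shows what value they take once the access counts are known.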


V.  The  MILP  formulation 

Let $\Pi$ denote the computer platform (the parameters m,
s, H, B and the parameters describing the memory system).
$\mathrm{fmem}(\tau, \Pi, K)$ is a function which returns the tuple (flag, o)
where flag is a boolean and o is a multi-dimensional array. If
there exists an assignment of values to the variables so that
the constraints in Fig. 3 and Fig. 5 are satisfied then flag is

$$\forall (i,j,g,b) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]) \wedge (b \in [0,B-1]):\quad \mathrm{mb}_{i,j,g,b} = \sum_{p=0}^{\mathrm{np}_{i,j}-1} \sum_{h=0}^{H-1} \mathrm{MA}_{i,j,p} \times o_{i,j,g,p,h,b} \qquad (38)$$

$$\forall (i,i',j',g',b) \text{ such that } (\tau_i \in \tau) \wedge (\tau_{i'} \in \tau) \wedge (j' \in [1,\mathrm{nstages}_{i'}]) \wedge (g' \in [1,\mathrm{nseg}_{i',j'}]) \wedge (b \in [0,B-1]):\quad \mathrm{mmb}_{i,i',j',g',b} = \Big( \Big\lceil \frac{D_i}{T_{i'}} \Big\rceil + 1 \Big) \times \mathrm{mb}_{i',j',g',b} \qquad (39)$$

$$\forall (i,j,g,b) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]) \wedge (b \in [0,B-1]):\quad \mathrm{mmbo}_{i,j,g,b} = \sum_{\tau_{i'} \in \tau} \; \sum_{j' \in [1,\mathrm{nstages}_{i'}]} \; \sum_{g' \in [1,\mathrm{nseg}_{i',j'}] \wedge (((i'=i) \wedge (j'=j) \wedge (g' \neq g)) \vee (i' \neq i))} \mathrm{mmb}_{i,i',j',g',b} \qquad (40)$$

$$\forall (i,j,g,p) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]) \wedge (p \in [0,\mathrm{np}_{i,j}-1]):\quad \sum_{h=0}^{H-1} \sum_{b=0}^{B-1} o_{i,j,g,p,h,b} = 1 \qquad (41)$$

$$\forall (h,b) \text{ such that } (h \in [0,H-1]) \wedge (b \in [0,B-1]):\quad \Big( \sum_{\tau_i \in \tau} \; \sum_{j \in [1,\mathrm{nstages}_i]} \; \sum_{g \in [1,\mathrm{nseg}_{i,j}]} \; \sum_{p \in [0,\mathrm{np}_{i,j}-1]} \Big( \frac{1}{\mathrm{GS}_{i,j,g,p}} \times o_{i,j,g,p,h,b} \Big) \Big) + \mathrm{ino}_{h,b} \le \mathrm{CAP} \qquad (42)$$

$$\sum_{h \in [0,H-1]} \; \sum_{b \in [0,B-1]} \mathrm{ino}_{h,b} = \mathrm{INO} \qquad (43)$$

$$\forall (i,j,g,h) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]) \wedge (h \in [0,H-1]):\quad (x_{i,j,g,h} = 1) \Rightarrow \Big( \sum_{p \in [0,\mathrm{np}_{i,j}-1]} \; \sum_{b \in [0,B-1]} o_{i,j,g,p,h,b} \ge 1 \Big) \qquad (44)$$

$$\forall (i,j,g,h) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]) \wedge (h \in [0,H-1]):\quad (x_{i,j,g,h} = 0) \Rightarrow \Big( \sum_{p \in [0,\mathrm{np}_{i,j}-1]} \; \sum_{b \in [0,B-1]} o_{i,j,g,p,h,b} \le 0 \Big) \qquad (45)$$

$$\forall (i,j,g,i',j',g',h) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]) \wedge (\tau_{i'} \in \tau) \wedge (j' \in [1,\mathrm{nstages}_{i'}]) \wedge (g' \in [1,\mathrm{nseg}_{i',j'}]) \wedge (h \in [0,H-1]) \wedge (((i=i') \wedge (j=j') \wedge (g<g')) \vee (i<i')):\quad x_{i,j,g,h} + x_{i',j',g',h} \le 1 \qquad (46)$$

$$\forall (i,j,g) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]):\quad \mathrm{cm}_{i,j,g} \le \mathrm{cu}_{i,j} \qquad (47)$$

$$\forall (i,j) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]):\quad \sum_{g=1}^{\mathrm{nseg}_{i,j}} \mathrm{se}_{i,j,g} = 1 \qquad (48)$$

$$\forall (i,j,g) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]):\quad (\mathrm{se}_{i,j,g} = 1) \Rightarrow (\mathrm{cm}_{i,j,g} \ge \mathrm{cu}_{i,j}) \qquad (49)$$

$$\forall (i',j',g',p',i'',j'',g'',p'',h,b) \text{ such that } ((i',j',g',p',i'',j'',g'',p'') \in \mathrm{sharedframes}) \wedge (h \in [0,H-1]) \wedge (b \in [0,B-1]):\quad o_{i',j',g',p',h,b} = o_{i'',j'',g'',p'',h,b} \qquad (50)$$

$$\forall (i,j,g) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]):\quad \mathrm{cm}_{i,j,g} = C_{i,j} + s \times \Big( \sum_{b=0}^{B-1} ( \mathrm{coat}_{i,j,g,b} + L_{inter} \times \mathrm{oao}_{i,j,g,b} ) \Big) \qquad (51)$$

Each of the constraints (52)-(66) holds $\forall (i,j,g,b) \text{ such that } (\tau_i \in \tau) \wedge (j \in [1,\mathrm{nstages}_i]) \wedge (g \in [1,\mathrm{nseg}_{i,j}]) \wedge (b \in [0,B-1])$:

$$\mathrm{bc1}_{i,j,g,b} + \mathrm{bc2}_{i,j,g,b} + \mathrm{bc3}_{i,j,g,b} + \mathrm{bc4}_{i,j,g,b} = 1 \qquad (52)$$

$$(\mathrm{bc1}_{i,j,g,b} = 1) \Rightarrow (\mathrm{mmbo}_{i,j,g,b} \le \mathrm{mb}_{i,j,g,b} \times \mathrm{LIM1}) \qquad (53)$$

$$(\mathrm{bc2}_{i,j,g,b} = 1) \Rightarrow (\mathrm{mb}_{i,j,g,b} \times \mathrm{LIM1} \le \mathrm{mmbo}_{i,j,g,b} \le \mathrm{mb}_{i,j,g,b} \times (\mathrm{LIM1}+1)) \qquad (54)$$

$$(\mathrm{bc3}_{i,j,g,b} = 1) \Rightarrow (\mathrm{mb}_{i,j,g,b} \times (\mathrm{LIM1}+1) \le \mathrm{mmbo}_{i,j,g,b} \le \mathrm{mb}_{i,j,g,b} \times \mathrm{LIM2}) \qquad (55)$$

$$(\mathrm{bc4}_{i,j,g,b} = 1) \Rightarrow (\mathrm{mb}_{i,j,g,b} \times \mathrm{LIM2} \le \mathrm{mmbo}_{i,j,g,b}) \qquad (56)$$

$$(\mathrm{bc1}_{i,j,g,b} = 1) \Rightarrow (\mathrm{coat}_{i,j,g,b} = \mathrm{mmbo}_{i,j,g,b} \times (L_{conf} + L_{inter})) \qquad (57)$$

$$(\mathrm{bc2}_{i,j,g,b} = 1) \Rightarrow (\mathrm{coat}_{i,j,g,b} = \mathrm{mb}_{i,j,g,b} \times \mathrm{LIM1} \times (L_{conf} + L_{inter}) + (\mathrm{mmbo}_{i,j,g,b} - \mathrm{mb}_{i,j,g,b} \times \mathrm{LIM1}) \times (\mathrm{WL} + \mathrm{BL}/2 + t_{WR}) \times t_{CK}) \qquad (58)$$

$$(\mathrm{bc3}_{i,j,g,b} = 1) \Rightarrow (\mathrm{coat}_{i,j,g,b} = \mathrm{mb}_{i,j,g,b} \times \mathrm{LIM1} \times (L_{conf} + L_{inter}) + \mathrm{mb}_{i,j,g,b} \times (\mathrm{WL} + \mathrm{BL}/2 + t_{WR}) \times t_{CK} + (\mathrm{mmbo}_{i,j,g,b} - \mathrm{mb}_{i,j,g,b} \times (\mathrm{LIM1}+1)) \times (\mathrm{WL} + \mathrm{BL}/2 + t_{WR} + \mathrm{CL}) \times (1/2) \times t_{CK}) \qquad (59)$$

$$(\mathrm{bc4}_{i,j,g,b} = 1) \Rightarrow (\mathrm{coat}_{i,j,g,b} = \mathrm{mb}_{i,j,g,b} \times (\mathrm{LIM1} \times (L_{conf} + L_{inter}) + L_{conhit}(N_{re}))) \qquad (60)$$

$$(\mathrm{bc4}_{i,j,g,b} = 0) \Rightarrow (\mathrm{oat}_{i,j,g,b} = \mathrm{mmbo}_{i,j,g,b}) \qquad (61)$$

$$(\mathrm{bc4}_{i,j,g,b} = 1) \Rightarrow (\mathrm{oat}_{i,j,g,b} = \mathrm{mb}_{i,j,g,b} \times \mathrm{LIM2}) \qquad (62)$$

$$(\mathrm{bu}_{i,j,g,b} = 0) \Rightarrow \Big( \sum_{b' \in [0,B-1] \wedge (b' \neq b)} \mathrm{mmbo}_{i,j,g,b'} \le (\mathrm{mb}_{i,j,g,b} + \mathrm{oat}_{i,j,g,b}) \times (B-1) \Big) \qquad (63)$$

$$(\mathrm{bu}_{i,j,g,b} = 1) \Rightarrow \Big( \sum_{b' \in [0,B-1] \wedge (b' \neq b)} \mathrm{mmbo}_{i,j,g,b'} \ge (\mathrm{mb}_{i,j,g,b} + \mathrm{oat}_{i,j,g,b}) \times (B-1) \Big) \qquad (64)$$

$$(\mathrm{bu}_{i,j,g,b} = 0) \Rightarrow \Big( \mathrm{oao}_{i,j,g,b} = \sum_{b' \in [0,B-1] \wedge (b' \neq b)} \mathrm{mmbo}_{i,j,g,b'} \Big) \qquad (65)$$

$$(\mathrm{bu}_{i,j,g,b} = 1) \Rightarrow (\mathrm{oao}_{i,j,g,b} = (\mathrm{mb}_{i,j,g,b} + \mathrm{oat}_{i,j,g,b}) \times (B-1)) \qquad (66)$$

$$\mathrm{mb}_{i,j,g,b} \in \mathbb{Z}_{\ge 0},\; \mathrm{mmb}_{i,i',j',g',b} \in \mathbb{Z}_{\ge 0},\; \mathrm{mmbo}_{i,j,g,b} \in \mathbb{Z}_{\ge 0},\; o_{i,j,g,p,h,b} \in \{0,1\},\; \mathrm{ino}_{h,b} \in \mathbb{Z}_{\ge 0},\; x_{i,j,g,h} \in \{0,1\},\; \mathrm{cm}_{i,j,g} \in \mathbb{R}_{\ge 0},\; \mathrm{se}_{i,j,g} \in \{0,1\},$$
$$\mathrm{bc1}_{i,j,g,b} \in \{0,1\},\; \mathrm{bc2}_{i,j,g,b} \in \{0,1\},\; \mathrm{bc3}_{i,j,g,b} \in \{0,1\},\; \mathrm{bc4}_{i,j,g,b} \in \{0,1\},\; \mathrm{bu}_{i,j,g,b} \in \{0,1\},\; \mathrm{coat}_{i,j,g,b} \in \mathbb{R}_{\ge 0},\; \mathrm{oao}_{i,j,g,b} \in \mathbb{Z}_{\ge 0},\; \mathrm{oat}_{i,j,g,b} \in \mathbb{Z}_{\ge 0} \qquad (67)$$

Fig. 5: Expressing increased execution time.


true  and  o  is  the  values  of  the  o-variables;  otherwise  flag  is 
false  and  o  is  undefined. 

Theorem 1.

$(((\mathrm{flag}, o) = \mathrm{fmem}(\tau, \Pi, K)) \wedge (\mathrm{flag} = \mathrm{true})) \Rightarrow$
$\tau$ is gEDF schedulable on a computer with m processors of speed s for
the case that tasks experience memory contention and the memory
mapping is compatible with o


Proof: If the theorem is false then there exists a $\tau$, m, s, K
and an assignment of the number of jobs that each task
generates and an assignment of arrival times to jobs and
execution requirements of segments and a schedule such that
the following two statements are true:

1) $((\mathrm{flag}, o) = \mathrm{fmem}(\tau, \Pi, K)) \wedge (\mathrm{flag} = \mathrm{true})$

2) for the jobset generated by $\tau$ with the aforementioned
assignment, it holds that gEDF can generate the aforementioned
schedule and there is at least one job that
misses its deadline in this schedule.

For this schedule, let $t_0$ denote the earliest time when a
deadline miss occurs. Remove all jobs with arrival time $\ge t_0$.
There is still a deadline miss at time $t_0$. Let us now reason
as follows: For each job with absolute deadline $> t_0$ such that
it performs execution after time $t_0$, do the following: identify
the latest stage of this job such that there is a segment of
this stage that performs execution after $t_0$. Then reduce the
execution of this segment. Repeated application of this yields
that no job with absolute deadline $> t_0$ performs execution
after time $t_0$. Hence, it holds that: (i) 1) and 2) above are
true, (ii) one or many jobs with absolute deadline at $t_0$ miss
their deadlines, (iii) each job with absolute deadline $< t_0$ meets its
deadline, (iv) all jobs have arrival times $< t_0$, and (v) no job
with absolute deadline $> t_0$ performs execution after time $t_0$.

For each job with absolute deadline $\le t_0$, we can reason
as follows: Let $\tau_i$ denote the task that generates the job. Let A
denote the arrival time of this job and consider the time interval
$[A, A + D_i)$ and consider a task $\tau_{i'}$ which is not the task that
generated the job of task $\tau_i$. Because of (iii) and (iv), there can
be at most one job of task $\tau_{i'}$ such that this job arrives before
A and it has execution that overlaps with $[A, A + D_i)$. Also,
because of (iii), there can be at most $\lceil D_i / T_{i'} \rceil$ jobs of task $\tau_{i'}$
such that this job arrives at or after A and it has execution
that overlaps with $[A, A + D_i)$.

For each job with absolute deadline $> t_0$, we can reason
as follows: Let $\tau_i$ denote the task that generates the job. Let A
denote the arrival time of this job and consider the time interval
$[A, A + D_i)$ and consider a task $\tau_{i'}$ which is not the task that
generated the job of task $\tau_i$. Because of (iii) and (iv), there can
be at most one job of task $\tau_{i'}$ such that this job arrives before
A and it has execution that overlaps with $[A, A + D_i)$. Also,
because of (v), there can be at most $\lceil D_i / T_{i'} \rceil$ jobs of task $\tau_{i'}$
such that this job arrives at or after A and it has execution
that overlaps with $[A, A + D_i)$.

Consequently, for each of these cases, there are at most
$(\lceil D_i / T_{i'} \rceil + 1) \times \mathrm{mb}_{i',j',g',b}$ memory accesses on bank b of
jobs of the g'-th segment of the j'-th stage of task $\tau_{i'}$ that
overlap with $[A, A + D_i)$. This expression is the right-hand
side of the expression of (39). Hence, there are at most
$\mathrm{mmb}_{i,i',j',g',b}$ memory accesses of jobs of the g'-th segment
of the j'-th stage of task $\tau_{i'}$ that overlap with $[A, A + D_i)$.

Since we know the values of $\mathrm{mmb}_{i,i',j',g',b}$, using Fig. 5
yields $\mathrm{cm}_{i,j,g}$. This yields $\mathrm{cu}_{i,j}$, which provides an upper
bound on the execution requirement. Since $\mathrm{cu}_{i,j}$ is an upper
bound on the execution requirement, we can treat the system as if
there was no contention for resources in the memory system
and execution requirements were given by $\mathrm{cu}_{i,j}$. Since the
constraints in Fig. 3 are satisfied, all deadlines are met. This
contradicts 2) above. Hence, the theorem is correct. ■

Note that some of the constraints mentioned are not MILP
constraints; they have binary variables and logical operators. We will
discuss this now. A constraint of the form $(x = 1) \Rightarrow (a = b)$
can be rewritten as: $((x = 1) \Rightarrow (a \le b)) \wedge ((x = 1) \Rightarrow (a \ge b))$.
Note that if x is a variable with the domain $\{0,1\}$ and
a and b are non-negative real variables and BIG is a constant
selected so that $a \le \mathrm{BIG}$ and $b \le \mathrm{BIG}$, then a constraint
$(x = 1) \Rightarrow (a \le b)$ can be rewritten as

$$a - b + \mathrm{BIG} \times x \le \mathrm{BIG} \qquad (68)$$

Note  that  in  a  feasible  solution  to  Fig.  3  and  Fig.  5,  for  the 
variables  in  the  constraints  (52)-(67),  the  variable  is  at  most 

+  E  E  E  ma., „,,,,))) 

j  e  [1' .nstages^/]  g'e  [l.nseg^/  ^./]  p' e  [0,np^/  — 1] 

(69) 

Hence, for the constraints (52)-(67), the left-hand side (lhs)
is at most

$$\max(2 \times (B-1),\; L_{conf} + L_{inter}) \times ((69)) \qquad (70)$$

Also, for each of the other constraints, the lhs is at most

(P  +  DMAX)  X  m  X  max(l,  s)  H  x  B  (71) 

Applying  the  rewriting  expressed  by  (68)  (and  minor  variants 
of  it),  with  BIG  =  max((70),  (71)),  yields  that  all  of  our 
constraints  can  be  converted  to  a  MILP. 
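
The rewriting in (68) can be checked exhaustively on a small grid. The following Python sketch is only an illustration: the function names and the default grid are assumptions, and the check covers integer values of a and b in [0, BIG] rather than all reals.

```python
import itertools

def implication_holds(x, a, b):
    """Logical form: (x = 1) implies (a <= b)."""
    return (x != 1) or (a <= b)

def big_m_holds(x, a, b, big):
    """Linear form of (68): a - b + BIG * x <= BIG."""
    return a - b + big * x <= big

def check_equivalence(big=10, step=1):
    """Compare both forms for every x in {0, 1} and a, b in [0, BIG]."""
    values = range(0, big + 1, step)
    for x, a, b in itertools.product((0, 1), values, values):
        if implication_holds(x, a, b) != big_m_holds(x, a, b, big):
            return False
    return True
```

With x = 0 the linear form reduces to a - b <= BIG, which always holds because a <= BIG and b >= 0; with x = 1 it reduces to a <= b. This is why BIG must bound both sides of every constraint it is applied to.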


VI.  Discussion 

Recall (from Section II) that in general, it is possible to
obtain (e.g. using a worst-case execution-time analysis tool)
$C_{i,j}(\mathrm{map})$ and $\mathrm{MA}_{i,j,p}(\mathrm{map})$ but it is very expensive to obtain
$C_{i,j}$ and $\mathrm{MA}_{i,j,p}$. This can be dealt with by guessing values of
the latter, calling the function $\mathrm{fmem}(\tau, \Pi, K)$ to obtain
a new memory mapping, and then checking, for this memory mapping,
whether the guess was valid. Also, note that solving the
MILP produces an abstract memory mapping o. It is abstract
because it does not specify exactly to which memory frame
a page should be mapped; it only specifies to which cache
color and bank color a memory page should be mapped. We
assume a method exists that converts the abstract mapping o
to a mapping map that specifies for each page which frame it
maps to (it is trivial to create it). An algorithm based on these
ideas is shown below:


Choose a value of K (for example K = 20)
Choose a value of maxiter (for example maxiter = 3)
Choose one abstract memory mapping o'
for iter := 1 to maxiter do
    choose a memory mapping map' that is compatible with the
        abstract memory mapping o'
    $\forall i,j$: obtain $C_{i,j}(\mathrm{map}')$ and then assign $C^{guess}_{i,j} := C_{i,j}(\mathrm{map}')$
    $\forall i,j,p$: obtain $\mathrm{MA}_{i,j,p}(\mathrm{map}')$ and then assign $\mathrm{MA}^{guess}_{i,j,p} := \mathrm{MA}_{i,j,p}(\mathrm{map}')$
    (flag, o) = $\mathrm{fmem}(\tau, \Pi, K)$; in this call, assume that
        $\forall i,j: C_{i,j} = C^{guess}_{i,j}$ and $\forall i,j,p: \mathrm{MA}_{i,j,p} = \mathrm{MA}^{guess}_{i,j,p}$
    if flag then
        choose a memory mapping map that is compatible with the
            abstract memory mapping o
        o' := o
        if ($\forall i,j: C_{i,j}(\mathrm{map}) \le C^{guess}_{i,j}$ and
            $\forall i,j,p: \mathrm{MA}_{i,j,p}(\mathrm{map}) \le \mathrm{MA}^{guess}_{i,j,p}$) then
            declare SUCCESS
        end if
    else
        choose one abstract memory mapping o' that has not been
            tried before
    end if
end for
declare FAILURE

Hence,  our  solution  can  be  used  in  practice. 
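
The guess-solve-verify loop of the algorithm can be sketched in Python as follows. Every callable passed to configure is a hypothetical stand-in (the solver call, the measurement step and the conversion from an abstract to a concrete mapping are not specified here), and the dictionary representation of the guessed values is an assumption made for illustration.

```python
def configure(fmem, measure, realize, initial_abstract, fresh_abstract,
              K=20, maxiter=3):
    """Iterate guess -> solve -> verify.

    Hypothetical callables:
      fmem(C_guess, MA_guess, K) -> (flag, o)   solves the MILP
      measure(mapping) -> (C, MA)               dicts of measured values
      realize(o) -> mapping                     concrete mapping compatible with o
      fresh_abstract() -> o                     abstract mapping not tried before
    """
    o_prime = initial_abstract
    for _ in range(maxiter):
        map_prime = realize(o_prime)
        # Use the values measured under map' as the guess for the MILP.
        C_guess, MA_guess = measure(map_prime)
        flag, o = fmem(C_guess, MA_guess, K)
        if flag:
            mapping = realize(o)
            o_prime = o
            C, MA = measure(mapping)
            # The guess is valid if the new mapping does not exceed it anywhere.
            if (all(C[k] <= C_guess[k] for k in C)
                    and all(MA[k] <= MA_guess[k] for k in MA)):
                return "SUCCESS", mapping
        else:
            o_prime = fresh_abstract()
    return "FAILURE", None
```

Returning the concrete mapping together with the status lets a caller apply the configuration directly when the guess is confirmed.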

We  have  implemented  a  tool  based  on  this  theory  and 
tested  it  on  systems  with  4  and  8  processors.  With  this 
experimentation,  we  find  that  such  systems  can  be  analyzed 
and  configured  and  the  pessimism  is  reasonable.  See  appendix. 

VII.  Conclusions 

COTS  multicore  processors  are  the  norm  today  but  their 
use  for  hard  real-time  systems  is  challenging  because  (i)  in 
order  to  take  full  advantage  of  such  platforms  for  meeting  tight 
deadlines,  parallelization  is  necessary  and  (ii)  the  contention 
for  shared  resources  in  the  memory  system  makes  execution 
times  hard  to  predict.  In  this  paper,  we  have  developed  a 
solution  that  addresses  these  issues.  Our  main  idea  is  to 
formulate  a  MILP  that  configures  the  memory  mapping  and 
performs  schedulability  analysis. 